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Abstract: Genome-Wide Association Studies (GWAS), 
whole genome sequencing, and high-throughput omics 
techniques have generated vast amounts of genotypic 
and molecular phenotypic data. However, these data have 
not yet been fully explored to improve the effectiveness 
and efficiency of drug discovery, which continues along a 
one-drug-one-target-one-disease paradigm. As a partial 
consequence, both the cost to launch a new drug and the 
attrition rate are increasing. Systems pharmacology and 
pharmacogenomics are emerging to exploit the available 
data and potentially reverse this trend, but, as we argue 
here, more is needed. To understand the impact of 
genetic, epigenetic, and environmental factors on drug 
action, we must study the structural energetics and 
dynamics of molecular interactions in the context of the 
whole human genome and interactome. Such an ap- 
proach requires an integrative modeling framework for 
drug action that leverages advances in data-driven 
statistical modeling and mechanism-based multiscale 
modeling and transforms heterogeneous data from 
GWAS, high-throughput sequencing, structural genomics, 
functional genomics, and chemical genomics into unified 
knowledge. This is not a small task, but, as reviewed here, 
progress is being made towards the final goal of 
personalized medicines for the treatment of complex 
diseases. 



Introduction 

Drug discovery, as broadly practiced, suffers from several 
shortcomings. First, although the vast amounts of genotypic and 
molecular phenotypic data generated from Genome-Wide Asso- 
ciation Studies (GWAS); whole genome sequencing (WGS) [1]; 
and high-throughput techniques such as RNA-seq [2], ChlP-seq 
[3], BS-seq [4], and DNase-seq [5] provide an unprecedented 
opportunity to understand the etiology of complex diseases and to 
discover safe and potent personalized medicines, to date these data 
have not been fully explored to improve the effectiveness and 
efficiency of drug discovery. Second, modern target-based drug 
discovery is characterized as a one-drug-one-gene paradigm, and 
has been of limited success in attacking complex diseases. Third, 
phenotypic screens and cell-based assays generate a large number 
of active compounds relevant to disease treatment, but give few 
hints as to what their molecular targets are [6-9]. As a result of 
these shortcomings, the cost to launch a new drug is typically more 
than US$1 billion, and that cost continues to increase, with only 
around one-third of drugs in phase III clinical trials reaching the 
market. The emerging field of systems pharmacology is addressing 



these shortcomings and beginning to change the way we think 
about drug action in multigenic, complex diseases [10-15]. 

As illustrated in Figure 1, a drug commonly not only interacts 
with its intended molecular target (on-target) but also binds to and 
affects other targets (off-targets) that are often unknown [16]. Each 
drug-target interaction modifies the conformational dynamics of 
the target structure and results in the alternation of the functional 
states (e.g., activation versus inhibition). Consequently, the 
changing conformational and functional states of both on-targets 
and off-targets directly or indirectly affects other molecular 
components and their interactions through the interplay of 
complex signal transduction, gene regulation, and metabolic 
networks that collectively mediate the system-level response to 
the drug, leading to either therapeutic or adverse effects [10]. A 
variety of genetic, epigenetic, and environmental factors define the 
initial pathophysiological state of the molecular components and 
their interactions, which then dynamically evolve when perturbed 
by a drug. Stated another way, both target- and non-target 
associated genetic and/ or epigenetic alternations could impact the 
drug response. In addition to inherited genetic and/ or epigenetic 
factors, cellular, tissue, and organism environments may have 
significant effects on drug efficacy and side effects [17-21]. For 
example, tumor-stromal interactions play key roles in anticancer 
drug sensitivity [22]. 

The underlying hierarchical organization of living organisms 
makes it essential to model drug actions from DNA to gene, to 
protein and its molecular ensemble, to cell, to tissue, to organ, to 
whole organism, and to population. Data-driven, network-based 
association studies and physical- or mathematical-based multiscale 
modeling are two pillars of the existing paradigm of systems 
pharmacology. Network-based association studies provide a 
promising avenue to realize personalized medicine. The 
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Figure 1 . A network view of drug action. Dark blue lines represent 
drug-target interactions. Green arrows are protein-protein interactions 
or biological reaction pathways. Yellow nodes represent genes affected 
by genetic variation. These variations will impact drug action by 
changing the information flow of drug-target interactions in the 
biological network, even when these genes are not themselves the 
direct drug targets. 
doi:1 0.1 371/joumal.pcbi.l 003554.g001 

reconstruction and analysis of genome-scale molecular interaction 
networks, including protein-protein interactions; protein-nucleic 
acid interactions; epistasis interactions, as found in signal 
transduction; gene regulation; and metabolic networks, have 
emerged as a powerful framework to integrate heterogeneous 
DNA variation and omics data in associating genotypes with 
phenotypes under various environmental and drug-induced 
conditions [16]. By taking advantage of the progress in network 
and systems biology, mechanism-based multiscale modeling that 
spans different temporal and spatial scales has already been able to 
predict genotype-phenotype associations in a whole cell model of 
Mycoplasma genitalium [23] and quantitatively simulate drug actions 
at the organism level [24]. 

However, several challenges remain in the application of 
systems pharmacology. First, data-driven network-based associa- 
tion studies primarily rely on sophisticated statistical techniques. 
Although great efforts have been made to address the n«p 
problem, where the number of observations n is much smaller than 
the number of variables or parameters p, the power of these 
statistics-based techniques remains limited if sample sizes are 
small. The "causal" relationships inferred from these methods are 
simply mathematical correlations. They may not provide biolog- 
ical insight into the underlying molecular mechanisms that enable 
the development of actionable models for understanding the drug- 
response phenotype. Second, the existing paradigm of multiscale 
modeling is to isolate subsystems (or parameters) from a phenotype 
space. These subsystems are first studied independently and then 
combined to infer their synergistic behavior. It is noted that the 
subsystem itself can be considered as a phenotype. Thus, this 
isolation/combination process could be multilevel and recursive. 
However, current organism-level physiological models are often 
too complex to be supported by existing data and computational 
power. The mathematical model is often nonidentifiable; that is, 
there is not a unique parameter set to explain the experimental 
observations [25]. On the other hand, physical models are often 
too computationally intensive to readily model the global behavior 
of the physiological system. Fundamentally, isolated parameter 
space may be not sufficient to "identify and elucidate the guiding 
principles of control and communication defining the behavior of 
an organism" [11]. Such a guiding principle is fundamental to 



reliably predict human behavior by scaling up animal models. 
Third, the enormous investment in molecular libraries and target-, 
cell-, and organism-based high-throughput compound screening 
has generated a massive amount of chemical genomics data [26- 
28]. There is no doubt that these data are invaluable in 
understanding how drugs work at the molecular, cellular, and 
organismal levels. However, this arguably most important dataset 
for systems pharmacology has not been fully incorporated into 
either the network-based association studies or multiscale model- 
ing frameworks, partly due to a lack of computational tools to map 
bioactive chemical space to its global target and pharmacological 
space. Lastly, it has been recognized that one of the critical hurdles 
in multiomics data integration and multiscale modeling is the lack 
of a common language and standard to annotate, exchange, reuse, 
and update computational models [29-31]. Due to the dynamic, 
complex, and multiscale nature of datasets and computational 
models needed to simulate a drug response under diverse genetic, 
epigenetic, and environmental conditions, an open and reusable 
conceptual framework that is able to link multilevel biological 
concepts and relationships is needed to realize the promise of 
community-driven predictive modeling of human physiology and 
pathology [32-35]. 

Although macromolecular structure is at the foundation of any 
molecular interaction, adding the structural and associated 
energetics and dynamics of the interplay between drugs, biomo- 
lecular targets, genetic and epigenetic variations, and environ- 
mental factors has not been fully exploited in systems pharmacol- 
ogy to date. A global three-dimensional macromolecular structure 
view of the biological system under study may offer new insights to 
address the aforementioned challenges. Stated more explicitly, a 
mechanistic understanding of how individual molecular compo- 
nents work together in a system and how the molecular 
interactions are affected and adapted to genetic and epigenetic 
variants and environmental perturbations requires knowledge of 
the underlying molecular structures and their conformational 
dynamics [36] . The information derived from the atomic details of 
molecular interactions, in principle, will enhance the power of 
statistical inference in data-driven systems biology and alleviate the 
current inability to fully characterize parameter space in 
mathematical modeling, revealing the guiding principles of 
systematic control and communication. Moreover, as a bridge to 
connect chemical and genomics space, macromolecular structure 
will allow us to link drugs, targets, and biological pathways, 
thereby providing a common framework to correlate molecular 
interactions with cellular functions. Leveraging the vast investment in 
chemical genomics, functional genomics, structural genomics, and structure- 
based drug discovery, together with efforts in systems pharmacology, may open a 
new door to developing personalized medicines for complex diseases. Thus, 
here we advocate and then justify a new paradigm of structural 
systems pharmacology. Structural systems pharmacology will 
model, on a genome scale, the energetic and dynamic modifica- 
tions of macromolecules (proteins, RNA, DNA) by drugs. The 
modeling accounts for genetic/ epigenetic and environmental 
factors as well as the subsequent collective effects on the 
information flow in biological systems. 

Some advances have been made in incorporating macromolec- 
ular structure modeling into systems pharmacology. We review 
them in this article. We demonstrate that integrative modeling of 
drug action — from the structural and energetic basis of genome- 
wide molecular interactions to the clinical outcomes at the 
organism level — provides new insights into both therapeutic 
effects and side effects while taking into account genetic 
differences. In terms of scope, we first propose a hybrid modeling 
approach to integrate mechanism-based multiscale modeling and 
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simulation with a data-driven systems biology approach and 
suggest that macromolecular structure is an essential component to 
glue diverse technologies together into a unified framework. We 
demonstrate the potential of macromolecular structures in 
enhancing the capability of systems biology through enriching 
the connectivity of context-specific biological networks and 
resolving non-identifiable parameter space during network simu- 
lation. Then we focus on three aspects of structural systems 
pharmacology that link genetic events and drug-target interac- 
tions to the drug response phenotype. First, we focus on traditional 
pharmacodynamics, now in the context of structural systems 
pharmacology. Second, and in a similar fashion, we focus on 
traditional pharmacokinetics in the context of structural systems 
pharmacology. Third, we explore the role of structural systems 
pharmacology to enhance the power of pharmacogenomics and 
GWAS. Thus, this review complements several recent reviews that 
focus on a network view of systems pharmacology and its 
connection to phenotype [10-15]. Physical-based multiscale 
modeling will not be covered in detail, since this is presented 
elsewhere [37-40]. Ultimately, we argue, structural systems 
pharmacology should be incorporated into the modeling and 
simulation of macromolecular ensembles, tissues, and organisms. 

Structural Systems Pharmacology — Structure- 
Enabled Integrative Modeling of Drug Action 

As stated in the introduction, several challenges remain in 
systems pharmacology, limiting the ability to predictively model 
complex drug action under the influence of diverse genetic, 
epigenetic, and environmental factors. To address these challeng- 
es, we suggest an integrative structural systems pharmacology 
approach to understanding and predicting individual- and 
context-specific drug response phenotypes. In this proposed 
modeling framework, macromolecular structure is an indispens- 
able component to link chemical space to genomic space, to 
associate genotypes with phenotypes, and to account for the 
environmental impact on biological systems. To demonstrate the 
proposed model, we use the predictive modeling of drug-induced 
arrhythmia as an example (Figure 2). Drug-induced arrhythmia is 
a potentially life-threatening side effect that is a major concern in 
clinical trials. QT interval prolongation that can be measured by 
electrocardiogram (ECG) waves has been widely accepted as a 
biomarker for arrhythmia. Both multiscale physical and mathe- 
matical models [41-42] and network-based predictive models [43] 
have been developed to predict QT interval prolongation with the 
aim of predicting the drug side effect of arrhythmia at an early 
stage (blue-colored box in Figure 2). However, several key 
components are missing in these models. As a result, their 
prediction power is limited. Given a new or existing drug, we at 
least need to address the following issues in order to predict 
whether or not the drug may induce arrhythmia under a specific 
physiological context for a specific individual (Figure 2): 

(1) Identification of genome-wide drug-target interactions. The 
QT interval prolongation involves not only multiple ion 
channels (e.g., hERG, Kv7.1, Navl.5, and Cavl.2) but also 
multiple other genes that are functionally associated with the 
ion channel [43] . In addition, the interaction of a drug with 
metabolizing enzymes and regulatory genes may alter the 
concentrations of proteins that play roles in arrhythmia and 
the pharmacokinetic profile of the drug molecule. 

(2) Conformational dynamics and energetics of multiple ion 
channels under drug and genetic perturbation. The dynamic 
change of ion channel conformations (open and closed) during 



gating is the primary determinant of the membrane current 
during the action potential. Both drug binding and unbinding 
kinetics, as well as amino acid mutations, may impact the 
conformational change of the ion channel, leading to the 
change of action potential. The events of conformational 
dynamics can be modeled by Molecular Dynamics (MD) 
simulation. As genome-scale MD simulation is not feasible at 
this time, evolutionary and functional constraints that could 
be derived from sequencing and multiple omics data will be 
an invaluable asset to significantly reduce the conformational 
sampling space of MD simulation [44] . 

(3) Determination of the in vivo concentrations of relevant drugs 
and metabolites (e.g., ebastine [45]) that may affect the 
activity of ion channels. In principle, this could be achieved 
using a whole-body physiologically based pharmacokinetics 
(PBPK) model that incorporates tissue-specific, genome-scale 
metabolic models. However, to date, little attention has been 
paid to adjusting PBPK models using information from 
genome-wide drug-target interactions. 

(4) Identification of individual- and context-specific parameter 
spaces for PBPK and systems biology models. To predict 
individual- and context-specific (e.g., normal tissue versus 
inflamed tissue) drug responses, it is critical to define 
molecular states, network architectures, and dynamic param- 
eters at the molecular level under the physiological conditions 
that exist during drug treatment. Although a vast amount of 
GWAS and multiple types of omics data provide abundant 
opportunities for this purpose, these data have not been fully 
explored to define biological networks at molecular resolution. 

(5) Most importantly, it is necessary to integrate the above 
information into a coherent computational model across 
temporal and spatial scales. As these tasks traditionally span 
multiple disciplines and require different techniques, such as 
statistical machine learning [46], MD simulation, Ordinary 
Differential Equation (ODE)-based kinetics simulation, con- 
straint-based modeling, discrete logic models, etc., they may 
need to be implemented as independent functional modules 
and subsequently assembled into a complete framework. 
Consequently, these functional modules need clearly defined 
interfaces and metadata for bi-directional communication. 
For example, the PBPK model determines the fate of drug 
molecules. In turn, drug-target interactions may regulate the 
expression level of CYP450 (as detailed later), thus altering the 
parameter space of the PBPK model. It has been suggested 
that ontology-driven, rule-based modeling may facilitate the 
integrative modeling of drug actions [1 1] . The integration of 
rule-based semantic modeling and Bayesian statistical mod- 
eling [47-49], which can establish cause-effect relationships 
across temporal and spatial scales, could be a useful tool in 
combining diverse techniques and multiple sequencing, 
molecular, and omics data. Such a scheme is depicted in 
Figure 3. It combines information and biological knowledge 
from DNA variants and their associated genes; drug-target 
interactions; protein conformational states; biological path- 
ways; cellular networks, such as protein-protein interaction 
networks; molecular phenotypes, such as gene expression 
profiles; and different organism phenotypes. Using these 
individualized drug response phenotypes, the probability of 
causal mutations, involved targets, conformational states, 
molecular complexes or functional modules, and biological 
pathways and their links can be established by a priori 
knowledge from mechanism-based modeling or estimated 
using a Bayesian statistical framework. 



PLOS Computational Biology | www.ploscompbiol.org 



3 



May 2014 | Volume 10 | Issue 5 | e1 003554 



; h 

; input: " s ~^ 








drug 


chemical space biomolecule 


genotype 


molecular phenotype j 


* 



structure-enabled integrative modeling of drug action 



genome-wide 
(ill., drug-target interaction 

Ha 

I; r 



whole-body PBPK 




GWAS 



♦ • • • • 



I 



Molecular Dynamics 
simulation 



tissue-specific 
metabolic model 




W hERG 




* 



I 



systems biology 



gene 
regulation 



signal 
transduction 



PPI 
network 



AP cell 
model 



ventricular 
tissue model 




Multiscale Ventricular Electrophysiological Model 



output : 
potential 
arrhythmia 



J 



Figure 2. A structure-enabled integrative framework to model drug action. Given a set of inputs — a new or existing drug, known bioactive 
chemical space, the whole human proteome, an individual's genotypic data, and context-specific phenotypic data — it is possible, in principle, to 
construct a structure-enabled integrative model of drug action. Such a model comprises multiple integrated functional modules (rounded boxes) that 
span multiple levels of biological organization and can be used to infer drug-induced arrhythmia. Solid and open arrows indicate current workflows 
and missing links, respectively. Blue boxes represent two existing methods: multiscale ventricular electrophysiological modeling [41 -42] and protein- 
protein interaction (PPI) network-based predictive modeling [43] for the prediction of drug-induced arrhythmia represented as a pseudo 
electrocardiography (ECG). The other boxes represent functional modules that are critically important but have not been fully developed or 
incorporated into the modeling process. 
doi:10.1371/journal.pcbi.1003554.g002 



Although we use the arrhythmia example for illustrative 
purposes, the framework described in Figure 2 should be 
generakzable for the understanding and prediction of drug response 
phenotypes. As detailed in the remaining text, we will show that 
macromolecular structures play critical roles in reconstructing 
genome-scale, high-resolution molecular interaction models, simu- 
lating the conformational dynamics of drug-target complexes, 
enabling context-specific pharmacokinetics modeling, resolving 
nonidentifiable parameters in mathematical modeling, and enhanc- 
ing the predictive power of network-based association studies. Thus, 
the structure-enabled integrative modeling of drug actions may 
facilitate transforming conventional drug discovery process to a new 
paradigm based on systems pharmacology. 

Molecular Resolution of the Biological Network 
and Its Parameter Space 

Reconstruction of individual- and context-specific genome-scale 
biological networks (e.g., drug-target, protein-protein interaction 



(PPI), metabolism, gene regulation, and signal transduction) is the 
foundation of systems pharmacology. Concurrently, protein 
structure-based PPI networks [50-55] have already made signif- 
icant contributions to reliably expanding genome-scale PPIs [5 1— 
52], understanding the molecular mechanism of signal transduc- 
tion [56-57], revealing the evolutionary origin of pathogen-host 
interactions [58], elucidating the molecular basis of disease 
mutations [59], and designing novel molecular therapeutics to 
target the network, PPI interfaces, and allosteric modulation, as 
summarized by Duren-Frigola et al. [60]. 

When kinetic parameters are lacking, constraint-based Flux 
Balance Analysis (FBA) presents an alternative approach to 
compute the phenotypic properties of whole cells, especially 
genome-scale metabolic networks [61]. New biological insights 
have been gained when incorporating protein structural informa- 
tion into metabolic network modeling of bacteria, which cannot be 
achieved by FBA alone [62-63]. For example, structure-based 
reconstruction of a genome-wide metabolic network makes it 
possible to determine bacteria growth in response to temperature 
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Figure 3. Hierarchical cause-effect semantic modeling to 
understand and predict drug action across temporal and 
spatial scales by using diverse techniques and integrating 
multiple sequencing, molecular, and omics data. Arrowed edges 
represent cause-effect relationships between biological entities: genetic 
variation, ligand (allosteric or orthosteric), drug target, conformational 
state of the drug target, biological pathway, molecular phenotypes 
from multiple omics data, integrated biological network, and organism 
phenotype (e.g., disease). The thickness of the arrow indicates the 
degree of probability. And the + and - signs represent positive (or 
activated) and negative (or inhibited) regulation, respectively. For 
example, an allosteric ligand may interact with target ! to induce its 
active conformation that positively regulates pathway /. The positively 
regulated pathway 1 can be derived from an observed molecular 
phenotype / (e.g., gene expression profile). A context-specific biological 
network can be inferred by integrating multiple molecular phenotypes 
and be used to understand and predict an organismal phenotype. 
doi:1 0.1 371 /journal.pcbi.1 003554.g003 



changes [64] . This opens a new door to understanding the impact 
of the environment on drug action. It is noted that when protein 
structures are used, they are often treated as a single chain or as 
simply forming binary interactions during network analysis. In 
reality, under physiological conditions, proteins perform their 
functions through biological assemblies that may consist of 
multiple proteins. In a recent study, the 3-D structure of biological 
assemblies has been explicitly considered in the context of the 
genome-scale metabolic network. Novel drug targets and thera- 
peutics are expected to be identified through such an integrative 
modeling strategy [65]. Beyond microorganisms, the first recon- 
struction of a genome-scale human metabolic network (Recon-1) 
by Duarte et al. provided the foundation for applying FBA to 
complex disease modeling [66]. By taking advantage of the rapidly 
increasing omics data, new methods have been developed to 
model cell-specific [67], tissue-specific [68-69], and context- 
specific [70-71] human metabolic networks. 

Several recent efforts have made progress in reconstructing 
genome-scale high-resolution protein-chemical interaction models 
[72-76]. As shown in Figure 4A, the targets identified from 
chemical genomics and functional genomics data analysis mainly 
include existing known drug targets and their homologs, which 



only cover a small portion (~8%) of the human genome [77-95]. 
A large number of proteins whose cognate or designed ligands are 
less characterized (or unknown) or who have low-affinity bindings 
to drugs are very likely to be important or even critical to 
pathophysiological processes under consideration [16]. The target 
space can be significantly extended to ~50% of human genes 
using structural genomics data [96]. As illustrated in Figure 4B, 
when integrating chemical genomics data analysis and molecular 
modeling on a structural genome scale, it is possible not only to 
greatly extend the existing target space to ~50% of human and 
pathogen genomes (a five to 50 times increase over existing 
targets), but also to construct genome-wide high-resolution 
protein-chemical interaction models for millions of bioactive 
compounds [72,97]. Although it remains a challenge to accurately 
determine, under physiological conditions, the binding affinity and 
binding/ unbinding kinetics of these interaction models, these 
models provide a basis to simulate the conformational dynamics of 
protein targets perturbed by a drug (see details in next sections) 
[44,98]. Consequently, the physiological drug response (therapeu- 
tic effect or side effect) can be predicted by mapping the 
conformational states of drug targets into biological pathways 
and networks [44,98] . We expect that the integration of chemical 
genomics, structural genomics, and functional genomics will 
significandy enhance the capability of systems pharmacology for 
molecular target identification of bioactive compounds, drug 
repurposing, polypharmacological drug design, and side effect 
prediction. 

In the context of personalized medicine in the treatment of 
complex diseases, a critically important but less-addressed problem 
is the need to reconstruct genome-scale, structure-based gene 
regulatory networks. In the major groove of DNA, every base pair 
has a unique hydrogen-bonding signature, and the "direct 
readout" mechanism, in which the formation of a series of amino 
acid base-specific hydrogen bonds contributes to the protein-DNA 
binding specificity, has been commonly accepted [99]. Recently, 
Rohs et al. found that the binding of arginine residues to narrow 
minor grooves is a widely used mode for protein-DNA recognition 
[100-101]. Differing from the "direct readout" mechanism, their 
findings indicate that the minor groove can also provide 
information, such as local variations in DNA shape and 
electrostatic potential for protein-DNA recognition, and offer 
new insights into the structural and energetic origins of protein- 
DNA binding specificity. These studies highlight the importance of 
the role of macromolecular structures in understanding gene 
regulation. 

The reconstruction of the biological network is only the first step 
in understanding the dynamic and stochastic nature of cellular 
processes [102]. +When the network topology and kinetic 
parameters are defined, time-dependent deterministic functions, 
including Ordinary Differential Equations (ODEs) and Partial 
Differential Equations (PDEs) [103], are commonly employed to 
analyze the dynamics of signaling and metabolic pathways upon 
drug treatment [104]. For instance, ODEs have been successfully 
applied to investigate the effects of multitarget inhibition [105— 
106], to search for optimal target combinations for safe and 
effective anti-inflammation therapy [107], and to predict the 
network response to target inhibition [108]. PDEs can be used to 
model the concentration change of each component as a function 
of both time and space [103]. Such a technique is important in 
determining the effective concentration of a drug when it reaches 
its targets and off-targets in an essentially nonlinear, non- 
equilibrium cellular and microenvironment. Stochasticity is one 
of the fundamental properties of cellular processes [102]. 
Moreover, in vivo drug action involves a series of stochastic 
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Figure 4. Reconstruction of genome-wide, high-resolution protein-chemical interaction networks. (A) Distribution of existing drug 
targets, PDB structures, and homology models in the human genome. (B) A schema to reconstruct 3-D drug-target interaction networks by 
integrating chemical genomics, structural genomics, and functional genomics. Novel drug off-targets could be identified by using the drug-target 
interaction models from chemical genomics analysis, and followed by searching for entire human or pathogen structural genome. In addition to 
sequence and global structural comparison, ligand binding site comparison is a valuable method, as it can identify binding promiscuity across fold 
space [119-121,231-238]. After putative off-targets have been identified from structural genomics analysis, sophisticated molecular modeling 
techniques such as protein-ligand docking and Molecular Dynamics (MD) simulation can be applied to determine high-resolution interaction models 
and their binding affinity and conformational space. To correlate drug-target interactions with their physiological response, the conformational state 
of the drug— target complex can be mapped to biological pathways, integrated networks, and physiological models. Several examples are shown in 
the figure. Semantic-based modeling is able to establish cause-effect from drug to target to pathways and, ultimately, to clinical outcomes [98]. 
Biological pathway analysis will provide the mechanistic understanding of information flow caused by drug modulation [44]. Critical components and 
interactions involved in drug modulation can be identified through integrated protein-protein interaction (PPI) network analysis [21 2], Here, blue and 
green nodes represent drug targets and genes with observable changes, respectively. The target inhibition or activation along with genetic 
perturbations can be simulated using reconstructed physiological models [71]. In turn, the information from pathway and network analysis can be 
used to verify or falsify the drug-target interaction models and to constrain their conformational space. 
doi:10.1371/journal.pcbi.1003554.g004 



processes such as prodrug transport and efflux [109]. Thus, 
stochastic models will be important tools in modeling drug action 
[110]. 

However, dynamic modeling is hampered by the lack of reliable 
kinetic parameters. In many cases, the kinetic parameters for 
enzyme reactions can be estimated from protein structures [111]. 
The computational techniques required are dependent on the 
reaction mechanism. Quantum Mechanics (QM) or Quantum 
Mechanics/Molecular Mechanics (QM/MM) is needed if bond- 
breaking is the rate-limiting step. Whereas, Brownian dynamics is 
more efficient if the reaction is fast and the diffusion rate is the 
determinant. Nevertheless, these computational techniques are 
time-consuming and cannot be easily extended to a genome scale. 



It has been proposed that the electrostatic potential in the active 
site determines not only the stability of transition states [112] but 
also the diffusion association rate [113]. Thus, it is proposed that 
the comparison of the Molecular Interaction Field (MIF), 
including the electrostatic potentials and other physical charac- 
teristics of structurally similar proteins, may assist in the estimation 
of the kinetic parameters [1 14] . A similar strategy has been applied 
to predict the association/disassociation rate of protein-protein 
interactions [115—117], which are essential for the dynamic 
modeling of signal transduction pathways. Furthermore, it is 
possible to extend the scope of MIF to a structural proteome scale 
for structurally unrelated proteins by analyzing and comparing 
the evolutionary, dynamical, physiochemical, geometric, and 
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transition-state binding properties of the binding interfaces [118- 
122]. The use of estimated kinetics parameters is particularly 
promising when coupled with a kinetic hybrid model [123]. In 
these models, detailed rate equations are only used to describe 
essential enzymes. Simplified and approximate rate equations are 
applied to the majority of enzymes. A recent study has shown that 
37% of enzymes in Escherichia coll are promiscuous and evolution- 
arily retained and catalyze 65% of known metabolic reactions 
[124]. With higher catalytic capability, specific enzymes tend to be 
essential and more frequendy coupled with gene regulation than 
promiscuous enzymes. This finding provides additional support for 
the structure-based hybrid model. It is noted that the kinetic 
parameters are often experimentally measured under three- 
dimensional conditions, which do not reflect the two-dimensional 
dynamic processes occurring when a drug binds to a receptor. Wu 
et al. presented a theoretical multiscale simulation approach that 
converts three-dimensional affinities to two dimensions, account- 
ing directly for the structure and dynamics of the membrane- 
bound molecules [125]. In summary, multiscale mathematical 
modeling and network-based association studies in systems 
pharmacology will benefit from the information derived from 
macromolecular structures in terms of the identification of reliable 
network connectivity as well as the enrichment of individual- and 
context-specific kinetic parameters. 

Pharmacodynamics in the Era of Structural Systems 
Pharmacology 

A major focus of pharmacodynamics is to quantitatively 
understand drug-target interactions and their effects on the whole 
organism. It is clear that drug action cannot be fully understood by 
a conventional one-drug-one-target paradigm. A systematic view 
of all proteome drug-target interactions is often necessary [16]. A 
drug molecule may act more than to inhibit or activate a target in 
a binary manner. Recent studies in biased agonism (or biased 
signaling, functional selectivity) [126] and partial agonism add a 
new dimension to understanding pharmacodynamics. For exam- 
ple, it has been recognized that a G-protein coupled receptor 
(GPCRs) pleiotropically regulates multiple signaling pathways. An 
endogenous or designed agonist for a GPCR may selectively 
activate one of its regulated pathways, leading to therapeutic 
efficacy or, alternatively, to unwanted side effects. At the 
molecular level, biased agonism originates from the selection of 
specific conformational states of the target protein, which are 
dynamically coupled with ligand binding. Fundamentally, the 
molecular mechanism of biased agonism may be similar to that of 
partial agonism observed in nuclear receptors (NRs). The 
transcriptional activity modulated by NR agonists is not dependent 
on the binding affinity but rather the ensemble of both protein 
conformations and ligand orientations [127-128]. Moreover, 
allosteric interaction can shift the conformational ensembles, 
thereby modulating the activity of agonist binding [36] . 

These findings provide both new opportunities and impose 
challenges in linking in vitro drug binding with associated in vivo 
activity. In addition to identifying proteome-wide drug binding 
promiscuity and specificity, it is necessary to sample the 
conformational ensemble associated with these drug-target 
interactions and to link their conformational state with biological 
pathways. Although a number of state-of-the-art computational 
techniques (e.g., those reviewed in [16] and Figure 4) are able to 
predict drug binding cross-reactivity, few of them provide a high- 
resolution landscape for the complete conformational space of 
drug-target interactions. State-of-the-art methods accounting for 
conformational flexibility are not capable of mapping the 
conformational ensembles to the signaling pathways that they 



modulate [129]. New concepts and techniques are needed to 
include the influence of protein dynamics on functional activity of 
drug binding in the context of biological networks. 

At the molecular level, conventional single-target, single-state 
virtual screening and quantitative structure-activity relationships 
(QSAR) should be extended to a multitarget, multiconformation 
model. A wide array of experimental techniques, such as 
fluorescence spectroscopy [130], plasmon waveguide resonance 
spectroscopy [131], bioluminescence resonance energy transfer 
[132], circular dichroism [133], X-ray crystallography [134], site- 
directed mutagenesis [135], and 19F-NMR spectroscopy [136], 
have been developed to determine active conformations associated 
with biased and partial agonism. These data provide a foundation 
for developing multistate models of pharmacodynamics. Structure- 
based molecular modeling may play a critical role in understand- 
ing the biased signaling. For example, dynamic protein-ligand 
homology modeling coupled with site-directed mutagenesis data is 
used to determine the dimerization and activation models for 
GPCRs [137-138]. 

In the context of personalized medicine, the functional selectivity 
of drug binding can be modulated by mutations in the protein 
target, which directly impact orthosteric or allosteric interactions. 
Thus, it is important to assess the importance of the mutation on the 
energetics and dynamics of proteome-scale drug-target interactions. 
Protein structure may provide critical insights into how the 
mutation alters the drug response. The genetic predisposition to 
adverse drug reactions (ADR) can be rationalized through the 
atomic details of the interaction between the drug and its potential 
off-target using structural modeling. For example, Li et al. have 
discovered that an R41Q_ mutation in human cytosolic sialidase 
(HsNEU2), which is predisposed in a small portion of the Asian 
population, links ADRs to oseltamivir (Tamiflu) [139]. In addition 
to mutations that direcfly involve drug binding, the mutations that 
disrupt allosteric interactions are some of the major determinants of 
disease but have been little studied [140]. Several structure-based 
techniques have been developed to predict the effect of mutations 
on allosteric regulation. They include correlated mutation analysis 
[141-146], the detection of pairwise dynamic [147-148] or 
energetic coupling [149] between residues, and analysis of the 
global topology of protein structures [150-151]. These methods 
may eventually contribute to the design of allosteric drugs [36,152]. 

Dynamic simulation of drug unbinding kinetics is another area 
that may significantly impact personalized medicine. Drug-target 
interactions in vivo are different from those in vitro. In target or cell- 
based assays, the concentrations of both drug and target are fixed 
and the binding affinity is measured by thermodynamic equilibrium 
constants such as IC 5 o values, which reflect binding potency. 
However, in a living organism, the concentration of the drug, the 
target, and the other molecules constandy changes with time, rarely 
reaching equilibrium. Thus, the drug binding affinity is not an 
appropriate indicator of drug potency in vivo [153]. An increasing 
body of evidence suggests that drug efficacy correlates more strongly 
with drug-target residence time than with binding affinity [154 — 
157] . Long residence time can lead to sustained pharmacological 
effect and may also alleviate off-target toxicity. The residence time 
of a drug on its target can be gready influenced by conformational 
adaptation [158]. Recent studies suggest that the in vivo duration of 
drug efficacy not only depends on macroscopic pharmacokinetic 
properties like plasma half-life and the time needed to equilibrate 
between the plasma and the effect compartments, but is also 
influenced by long-lasting target binding and rebinding [159]. 

Experimental approaches to studying drug binding and 
unbinding to proteins have limitations in temporal and spatial 
resolution. It was reported recently that a computational network 
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analysis combined with explicit water MD simulations of the 
unbinding of small inhibitors from the enzyme FK506 Binding 
Protein (FKBP) provided a clear picture of the free energy 
landscape (both thermodynamics and kinetics) of ligand dissoci- 
ation [160]. The dissociation kinetics were characterized as a 
simple (i.e., single-exponential) time dependence with multiple 
dissociation pathways. A computational methodology using 
trajectory data from multiple Brownian dynamics simulations of 
ligand diffusion has been developed for characterizing the kinetics 
of drug-receptor interactions in terms of the encounter complex 
[161]. A computational approach named metadynamics has been 
used both for reconstructing the free energy and for accelerating 
rare events in systems described by complex Hamiltonians, at the 
classical or at the quantum level [162]. All-atom metadynamics 
simulations of a peptide substrate interacting with wild- type HIV-1 
protease in explicit solvent rendered accurate calculations of 
binding affinity and kinetics constant compared to the experi- 
mental data [163]. 

Ultimately, drug-target interactions and genetic events should 
be studied in the context of biological networks. Existing biological 
network analysis is conformationally stateless. Thus, these 
networks are not sufficient to model the influence of protein 
dynamics on the drug response phenotype. The integrative 
modeling depicted in Figures 2 and 3 provides a possible solution 
to incorporating conformational dynamics into network modeling. 
It is worth mentioning that this integrative modeling framework 
could also be a powerful tool in studying the pharmacodynamics of 
drug— drug interactions. Due to the robust nature of biological 
systems, drug combinations are often necessary and proven to be 
successful in treating complex diseases and combating drug 
resistance [164—165]. However, the Adverse Drug Reaction 
(ADR) resulting from drug-drug interaction is a serious problem 
in developing combination therapies. In addition to the pharma- 
cokinetics (see next section), the pharmacodynamics of the drug- 
drug interaction may play a critical role in the ADR [166]. 
Existing state-of-the-art methods for the prediction of drug-drug 
interactions are data-driven, which mainly establish statistical 
association from empirical observations but provide little infor- 
mation on the mechanism of drug-drug interaction. Thus, they 
may not be sufficient for the predictive modeling of drug-drug 
interactions during the early stages of drug development [167- 
1 68] . Using the proposed integrative modeling framework, it may 
be possible to predict drug-drug interactions de novo. 

Pharmacokinetics in the Era of Structural Systems 
Pharmacology 

The Absorption, Distribution, Metabolism, and Excretion 
(ADME) properties of a drug, i.e., absorption and distribution to 
its target(s), detoxification by metabolism and excretion of the drug 
from the human body, are the primary concerns of pharmaco- 
kinetics. At the molecular level, ADME properties are strongly 
influenced by the abundance and activity of transporters and 
metabolizing enzymes such as CYP450, UDP-glucuronosyltrans- 
ferases, sulfotransferases, N-acetyltransferases, glutathione S-trans- 
ferases, and methyltransferases. Much effort has gone into 
developing computational methods for in silico prediction of 
ADME properties. These methods initially only addressed the 
small drug molecule. Based on chemical structures, quantitative 
structure-activity relationship (QSAR)-based approaches have 
been extensively used to correlate the physiochemical properties 
of lead molecules with their ADME profiles [169-170]. Shifting 
from the ligand to the receptor, structure-based methods have 
been developed which leverage the ever-increasing number of 3-D 



structures of ADME related proteins. Similarity searches and 
traditional pharmacophore approaches are enhanced by more 
advanced molecular descriptors and 3-D pharmacophores that 
encode the details of ligand-target binding [171]. Dynamic 
properties of ligand-target binding could be incorporated into 
the pharmacophore with conformational sampling techniques. 
Protein-ligand docking based on virtual screening for millions of 
compounds can now be accomplished with ease [172]. However, 
applying molecular docking to ADME related proteins is 
complicated by the existence of large and flexible binding cavities 
in CYP450 and phase II metabolizing enzymes, which can 
accommodate more than one ligand [172]. Consequendy, the 
correlation between the docking score obtained for the best poses 
with experimentally determined binding free energies is usually 
poor. Nevertheless, in a recent study, Schlessinger et al. used a 
homology model of a norepinephrine transporter and molecular 
docking to successfully predict the prescription drugs which 
specifically bind to it [173]. With the availability of data from 
chemical genomics and high-throughput screening [28,174], the 
combination of multiple flexible docking tools with chemoinfor- 
matics may boost the performance of structure-based virtual 
screening [175]. Although more accurate in deriving binding free 
energy, a more rigorous thermodynamic approach is unfortunately 
more computationally demanding and not applicable to large- 
scale virtual screening. Besides molecular docking based virtual 
screening or molecular dynamics simulation, quantum mechanical 
and hybrid quantum mechanical/molecular mechanical (QM/ 
MM) methods have emerged as powerful tools for modeling 
reaction rates of drug metabolism. The whole reaction profile for 
benzene hydroxylation by CYP2C9 was studied with such a 
hybrid approach; a combined docking-MD-QM calculation was 
used to simulate the activation energy of CYP3A4 [176]. The 
challenge is to extend this technique to a structural proteome scale, 
as discussed in the previous sections. 

So far, pharmacogenomics prediction of ADME properties has 
mainly focused on the genotypic variations and polymorphisms in 
metabolizing enzymes — the overall contribution of pharmacoge- 
nomics to personalized medicine remains limited [177]. The 
pharmacokinetics of a drug is the interplay between the inherent 
physiochemical properties of the drug and its physiological 
environment. The expression of metabolizing enzymes and 
transporters is highly regulated by multiple factors, including 
genetic polymorphisms, xenobiotics induction, cytokines, hor- 
mones, and pathophysiological states, as well as gender and age of 
families [178]. Multi-allelic genetic polymorphisms depend signif- 
icantly on ethnicity and imply disparate clinical phenotypes 
including ADR, drug efficacy, drug resistance, and dose require- 
ment. A mechanistic understanding of these regulators is essential 
for the predictive modeling of pharmacokinetics. It is well 
established that CYP genes are direcdy regulated by nuclear 
receptors [179]. Multiple genes, such as p53, AP-1, Ras, and APC, 
are involved in the regulation of multiple drug-resistance 
transporters (ABC transporters) [180]. If a drug itself, or another 
drug, interacts with these genes, undesirable pharmacokinetics 
profiles or drug-drug interactions may arise. Thus, the pharma- 
cokinetics regulatory genes are potential drug off-targets that affect 
the fate of drug molecules in vivo. We call them "pharmacoki- 
netics off-targets" to distinguish them from those related to 
pharmacodynamics. The fact that the activity of direct pharma- 
cokinetics regulatory genes can be modified by their upstream 
genes adds a new layer of complexity to the problem. Therefore, to 
fully understand the molecular mechanisms underlying the 
ADME properties of a drug, it is necessary to identify the 
pharmacokinetics off-targets as well as the regulatory network of 
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pharmacokinetics genes on a proteome scale. Figure 5 shows a 
regulatory pathway of CYP3A, whose substrates include several 
hundred drugs [181]. LCMT1 is a methyltransferase that 
methylates the PP2A catalytic subunit and promotes its functional 
association with the PP2A regulatory subunits. PP2A is a major 
protein phosphatase that dephosphorylates PRMT1, thus inhib- 
iting PRMT1 enzymatic activity. PRMT1 is essential for the PXR 
transcription process. PXR dimerises with RXR to induce the 
gene expression of CYP3A. It is clear that genetic variations or 
drug perturbations on any one of the genes along this pathway 
may affect the abundance and activity of C YP3A, thereby leading 
to a change in the ADME properties. Using a structure-based off- 
target identification pipeline, LCMT1 has been identified as the 
off-target of several antibiotics [182], highlighting the potential 
power of proteome-scale structural modeling in predicting novel 
pharmacokinetics profiles and drug-drug interactions. 

Noncoding DNA may play important roles in the regulation of 
transporters and metabolizing enzymes. For example, the CYP 
family includes 58 pseudogenes that do not encode functional 
protein [183]. An increasing body of evidence suggests that 
pseudogenes have diverse functions that influence not only their 
parent genes but also apparently unrelated genes [184]. For 
example, one of the CYP450 genes, CYP2A6, has a pseudogene, 
CYP2A7. CYP2A7 may transfer a fragment of DNA to its parent 
gene CYP2A6, leading to a change in its sequence. It is observed 
that individuals who smoke have a mutated gene C YP2 A6* 1 B that 
is converted from a CYP2A7 polymorphism. CYP2A6*1B 
stabilizes its mRNA, thereby increasing first its expression level, 
then its activity in metabolizing nicotine [185]. The functional 
roles of most pseudogenes still remain elusive. Structural systems 
biology, as discussed in the next section, may shed new light on the 
functional relevance of noncoding DNA in pharmacokinetics. 

Drug metabolism is strongly dependent on the physiological 
state (e.g., obesity and diabetes) and environment (e.g., gut- 
microbiome). The prediction of inter-individual pharmacokinetics 
variation requires the coupling of pharmacogenomics and 
pharmacometabonomics [186]. In principle, mechanism-based 
modeling of drug interactions with transporters and metabolizing 
enzymes can be integrated with pharmacogenomics, pharmaco- 
metabonomics, and other omics data using the integrative 
modeling framework proposed in the previous section. Physiolog- 
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Figure 5. A proposed pathway that modulates the abundance 
and activity of CYP3A. LCMT1 is a potential off-target for antibiotics. 
The inhibition of LCMT1 will activate PXR, thereby increasing the 
activity of CYP3A. 
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ically based pharmacokinetics modeling has been developed in the 
past four decades. Although it is resource consuming in obtaining 
input parameters, progress made in in silico technologies has 
greatly facilitated the prediction of oral absorption and hepatic 
metabolism as well as mechanistic models of tissue distribution 
based on pharmacokinetics models. The botdeneck for physiolog- 
ically based pharmacokinetics modeling resides in the limited data 
available. As discussed in the previous section, dynamic simula- 
tions involving comprehensive metabolic pathways, signaling 
pathways, and cell physiology are being studied together with 
multiscale modeling at the cellular, organ, and whole-body levels 
[187]. The sparse number of available kinetic parameters calls for 
more structure-based modeling efforts in order to enable further 
multiscale systems pharmacology analysis. 

Pharmacogenomics and GWAS in the Era of 
Structural Systems Pharmacology 

Recent advances in pharmacogenetics and pharmacogenomics 
have identified genetic variants in several hundreds genes, notably 
drug metabolizing enzymes (e.g., CYP450), transporters, and drug 
targets [188]. The knowledge derived from such data has already 
resulted in individualized therapy. For example, the appropriate 
initial dose of the anticoagulation drug warfarin can be estimated 
using a pharmacogenomics algorithm [189]. Similarly, certain 
mutations can be used to predict alternative responsiveness to 
drugs in cancer therapy [190-194]. Acknowledgment of the 
emerging role of pharmacogenomics can be found in the labels of 
the Food and Drug Administration (FDA)-approved drugs 
montelukast [195] and cetuximab [196] and in FDA-approved 
diagnostic tests, for example, the microarray-based Roche 
AmpliChip for cytochrome P450 polymorphisms and the Invader 
UGT1A1 Molecular Assay for detecting polymorphisms that 
increase the risk of neutropenia when using the colon cancer drug 
irinotecan [15,197]. 

In spite of these advances, much more is needed for predictive 
modeling of individual drug responses. It is critical to understand 
the impact of genetic variation beyond direct drug interactions 
with on-targets, off-targets, metabolizing enzymes, and transport- 
ers. Many nontarget-associated genetic factors may affect the drug 
response phenotype. As shown in Figure 1, if a critical node, edge, 
or feedback loop is modified in a drug modulation pathway, the 
drug efficacy or side effect profile can change accordingly. One 
example is the mutation of K-RAS located downstream of the 
EGFR pathway, which causes resistance to the anticancer activity 
of EGFR inhibitors [198-199]. In another situation, genes aside 
from the drug target may regulate the same biological pathway 
where the system-level drug response is the result of their 
combinatorial control. The mutation or expression changes of 
such genes may enhance or reduce the drug response. The gain- or 
loss-of-function of mutations that are associated with drug action 
may come from genetic variations in both coding and noncoding 
regions. Recendy, the 1,000 Genomes Project has identified 38 
million single nucleotide polymorphisms (SNPs), 1.4 million short 
insertions and deletions, and more than 14,000 larger deletions 
from 1,092 individuals belonging to 14 ethnic groups. The 
individual-specific, rare, coding variants are located across a broad 
array of biological pathways. Moreover, there are hundreds of 
functionally annotated, rare, noncoding variants for each individ- 
ual [200]. It is expected that these variants will alter the 
pharmacokinetics, pharmacodynamics, and responsiveness to drug 
therapy [15]. The extremely large, multidimensional datasets from 
these studies present an exciting opportunity to expand the 
horizon of pharmacogenomics by identifying causal variants and 
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genes, and predicting pathways likely to be involved in drug 
response. The challenge is how to associate these data with a 
predicted drug response a priori. 

Many disease-associated variants and drug-response cryptic 
genetic factors remain uncharacterized; hence, new methods are 
urgently needed to annotate DNA variants, their functional roles, 
and their associations with drug actions. How to annotate the 
functional roles of DNA variants, especially for noncoding 
variants, is a challenge because of the diversity of noncoding 
functions, the incomplete annotation of regulatory elements, and 
unknown mechanisms of regulatory control. Several large-scale 
studies have been performed to annotate the noncoding genome 
and regulatory elements. These studies integrate the high- 
throughput functional genomics and comparative genomics 
datasets [3-5] to map the functional noncoding elements on a 
genome-wide scale. Variants on these elements can result in the 
different regulation of their target genes. For example, studies from 
the Encyclopedia of DNA elements (ENCODE) [201] and model 
organism Encyclopedia of DNA elements (modENCODE) [202- 
204] projects provide comprehensive maps of transcription 
binding for select cell lines and DNase maps for many primary 
cells and highlight the importance of noncoding DNAs in the 
regulation of complex phenotypes. With known functional 
elements and motifs, methods have been developed to predict 
the effect of newly observed rare and private mutations by 
integrating models of sequence motifs, chromatin states, and 
expression patterns in model organisms and in cultured human 
cells [205-208]. For example, quantitative sequence-activity 
models (QSAMs) [209] are trained based on these data in a 
massively parallel reporter assay (MPRA) developed to enable 
systematic dissection and optimization of transcriptional regulato- 
ry elements [208] . 

Since annotated pharmacogenomics biomarkers are incomplete 
and biased, the conventional "guilt-by-association" approach may 
not be sufficient to identify novel drug-response genetic markers 
[43] . The power of statistical analysis of pharmacogenomics and 
GWAS data is often limited by the moderate effect size of samples. 
As a result, a number of rare variants could be missed. 
Fundamentally, the genotype-phenotype relations established by 
statistical or machine learning approaches may be merely 
mathematical correlations. Macromolecular structures and their 
interactions may provide critical mechanistic insight into the 
functional roles of DNA variants and their impact on drug action. 
The modeling and analysis of macromolecular structures has 
already made significant contributions to our understanding of 
how mutations affect the stability, folding, and binding of 
macromolecules [210-211]. Several recent structure-based studies 
have provided high-resolution pictures for how variants rewire 
biological networks through allosteric regulation, protein-protein 
interaction (PPI), and protein-nucleic acid interaction (PNI) 
[59,140,212-213]. Kowarsch et al. showed that co-evolving 
residues can influence each other through allosteric regulation 
and are significantly more likely to be disease-associated than 
expected by chance [140]. By mapping the mutations on the 
structures, Wang et al. found that in-frame mutations are enriched 
on the interaction interfaces of proteins associated with the 
corresponding diseases and that the disease specificity for different 
mutations can be explained by their location within an interface 
[59]. Similar findings were observed in David et al.'s work [213] 
by combining a database of the 3-D structures of human protein/ 
protein complexes [214] and the humsavar database of non- 
synonymous Single Nucleotide Polymorphisms (nsSNPs) [215]. 

As shown in Figure 1, proteome-wide drug-target interactions 
and genome-wide genetic variants may collectively affect the 



functional state of complex biological networks that mediate the 
system-level response to the drug, leading to both therapeutic and 
adverse effects. Few computational tools are able to model the 
collective effects of drug perturbation and genetic variants in the 
context of the whole human genome and biological network, which 
is essential for the development of personalized medicines. Pathway 
analysis such as gene-set enrichment analysis (GSEA) [216] can be 
applied to pharmacogenomics data to concentrate the genetic 
perturbation along the annotated biological pathways. Another way 
to use prior knowledge of gene interrelationships is to incorporate 
the information into the association study itself through Bayesian 
techniques [2 1 7] or by using boosting to prioritize disease networks 
[218]. By integrating the atomic details of molecular interactions, 
co-evolution information, protein-protein interaction networks, 
transcriptional profiles, and pathway enrichment analysis, Xie et al. 
developed a structural systems biology approach to identifying the 
functional role of DNA variants and causal mutations with an 
extremely small sample size [212]. Through this approach, the 
driver mutations that confer hypoxia tolerance in Drosophila 
melanogaster were identified. Furthermore, the functional roles of 
several nsSNPs, which are predominantly involved in allosteric 
regulation, protein-protein interaction, and protein-nucleic acid 
interaction, were determined [212]. The power of variation- 
mediated pathway analysis can be further enhanced by incorpo- 
rating other regulatory and signaling network components, such as 
microRNA-target interactions, protein-nucleic acid interactions, 
and phosphorylation events, etc., and taking advantage of advanced 
graph mining algorithms [219-220]. 

The systematic perturbation of sequence variants can be 
introduced into the dynamics simulation of pathways and 
genome-scale modeling of biological networks through macromo- 
lecular structures. Recently, Cheng et al. developed a computa- 
tional framework to integrate missense mutation, protein struc- 
tural modeling, and ODE [221]. They introduced the Systemic 
Impact Factor (SIF) as a measurement of phenotype changes 
resulting from the mutation. SIF is a function of the free energy 
change caused by the mutation and systemic control coefficient. 
The free energy change in a protein directly leads to the change in 
its kinetic parameters. The control coefficient quantifies the 
sensitivity of the phenotype readout to the change in kinetic 
parameters. They tested their models on two cases: G2-M 
transition control in yeast and the human mitogen-activated 
protein kinase (MAPK) pathway. The SIF from the simulation is 
well correlated with the experimental results. The missense 
mutations in this study are mainly associated with protein stability. 
In the future, it will be interesting to quantify the systematic 
impact of broad types of mutations that modify allosteric 
regulation, molecular interaction, and gene expression. Chang et 
al. have integrated a reconstructed kidney model with structural 
bioinformatics and molecular modeling to predict the side effect 
profile of cholesterylester transfer protein (CETP) inhibitors and to 
identify genetic risk factors that cause adverse drug reactions [71]. 
An interesting finding from this study is the synthetic effect of 
drug-target interactions and nontarget-associated genetic modifi- 
cations. Serious side effects are caused by the combination of the 
drug treatment and the genetic alteration but not by either alone. 
It is anticipated that the identification of nontarget genetic factors 
that affect drug action will have a significant impact on 
personalized medicine. 

Conclusion 

The holistic, adaptive, and evolving nature of biological systems 
makes the quest for a simple and elegant mechanistic model to 
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explain and predict biological phenomena less than fruitful. At the 
same time, the emergence of big data should significandy shift our 
approach to biomedical research. Given these data and increased 
computer power, useful predictive statistical models are possible. 
The question then becomes, how can we discover new knowledge 
from these statistical models under conditions that are constandy 
changing? 

A complex disease state is not static under drug treatment, but 
evolves to new states in adapting to the drug-induced environ- 
ment. When enough data are collected to describe one disease 
state and a successful model can be built, the disease may already 
be different from the one used to build the model. In such a 
temporal situation, a data-driven model is essentially retrospective 
but not prospective. Another major question is how much data are 
enough to build an accurate predictive model given the genetic, 
epigenetic, and clinical heterogeneity of complex diseases? Will the 
model still work if the individual has an unobserved new mutation? 
Moreover, genetic and/ or epigenetic events and drug actions are 
rooted in the fundamental principle of physics and chemistry. 
Indifference to the detailed physical and chemical nature of 
biological processes in the modeling of big biological data could 
eventually hinder scientific advances in biomedicine. Discovery of 
new knowledge requires more than just a query of a big reference 
table built from data. Macromolecular structure plays an 
irreplaceable role in linking the physical and chemical origins of 
genetic events and drug action to the systematic response at the 
cellular, tissue, and organism levels. Thus, the incorporation of 
physiochemical-based macromolecular structure modeling with 
data-driven and mathematical-based pharmacodynamics, phar- 
macokinetics, pharmacogenomics, and systems pharmacology will 
not only enhance the power of modeling a predictive personalized 
drug response but will also shed new light on our understanding of 
living systems in a broad sense. 
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